COMETS Analytics supports all cohort-specific analyses of the COMETS consortium. This collaborative work is done via the COMETS harmonization group activities. For more information, see the [COMETS website](http://epi.grants.cancer.gov/comets/).
The required input file should be in Excel format with the following 6 sheets: Metabolites, SubjectMetabolites, SubjectData, VarMap, Models, and Options.
An example input file is available HERE.
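Before loading a workbook, it can help to confirm that all expected sheets are present. The following is a minimal sketch, not part of COMETS: the sheet names are taken from the log messages printed by readCOMETSinput() later in this document, and `missing_sheets()` is a hypothetical helper introduced only for illustration.

```r
# Sheet names taken from the readCOMETSinput() log messages in this vignette.
required_sheets <- c("Metabolites", "SubjectMetabolites", "SubjectData",
                     "VarMap", "Models", "Options")

# Hypothetical helper (not a COMETS function): list required sheets that are absent.
missing_sheets <- function(sheet_names) {
  setdiff(required_sheets, sheet_names)
}

# Made-up example of sheet names found in a workbook.
present <- c("Metabolites", "SubjectData", "VarMap", "Models")
missing_sheets(present)
```

With a real file, the vector of sheet names could be obtained from `readxl::excel_sheets(path)` and passed to the helper.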
The first step of the analysis is to load in the data with the readCOMETSinput() function. Input for this function is an Excel spreadsheet, per the description above.
# Retrieve the full path of the input data
dir <- system.file("extdata", package="COMETS", mustWork=TRUE)
csvfile <- file.path(dir, "cometsInputAge.xlsx")
# Read in and process the input data
exmetabdata <- COMETS::readCOMETSinput(csvfile)
## Registered S3 method overwritten by 'seriation':
## method from
## reorder.hclust gclus
## [1] "Metabolites sheet is read in"
## [1] "SubjectMetabolites sheet is read in"
## [1] "SubjectData sheet is read in"
## [1] "VarMap sheet is read in"
## [1] "Models sheet is read in"
## [1] "Options sheet is read in"
## [1] "There are 14 categorical variables"
## [1] "Running Integrity Check..."
## Joining, by = "id"
## Joining, by = "hmdb_id"
## [1] "Input data has passed QC (metabolite and sample names match in all input files)"
To plot the distribution of variances for each metabolite:
To plot the distribution of minimum/missing values:
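The two summaries named above (per-metabolite variances and minimum/missing values) can be approximated in base R. This is a rough sketch on made-up data, not the COMETS plotting code; the data frame `abund` and its column values are assumptions for illustration.

```r
# Toy abundance table: rows are subjects, columns are metabolites (made-up values).
abund <- data.frame(
  lactose = c(1, 1, 3, 5),
  lactate = c(2, 2, 2, 2),
  glucose = c(0, 4, NA, 8)
)

# Per-metabolite variance, ignoring missing values.
met_var <- sapply(abund, var, na.rm = TRUE)

# Per-metabolite count of values that are missing or equal to the column minimum.
n_min_or_missing <- sapply(abund, function(x) {
  sum(is.na(x) | x == min(x, na.rm = TRUE))
})

# Bin the variances without drawing; drop plot = FALSE to see the histogram.
var_hist <- hist(met_var, plot = FALSE)
```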
There are two ways to specify a model: batch or interactive. In batch mode, models are specified in the Models sheet of your input file. The model information needs to be read in with the function getModelData() and processed so the software knows which models to run. The following call defines the “1 Gender adjusted” model from the Models sheet in the input file to be run.
In Interactive mode, models are specified as parameters. The model information needs to be read in with the function getModelData() and processed so the software knows which models to run.
The following call defines the model with age and bmi_grp as the exposure variables, and includes only the subjects with age > 40 and bmi_grp > 2.
exmodeldata2 <- COMETS::getModelData(exmetabdata, modelspec="Interactive",
    exposures=c("age","bmi_grp"), where=c("age>40","bmi_grp>2"))
## [1] "Filtering subjects according to the rule(s) age>40 & bmi_grp>2 . 279 of 1000 are retained."
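The log message shows the rules being joined with `&`, so a subject is kept only if every rule holds. The sketch below reproduces that behaviour on a made-up subject table in base R; it is an illustrative stand-in, not the COMETS implementation.

```r
# Toy subject table (made-up values); column names mirror those used above.
subjects <- data.frame(
  id      = 1:6,
  age     = c(35, 42, 55, 61, 39, 48),
  bmi_grp = c(3, 2, 3, 4, 4, 1)
)

# Rules written the same way as in the where argument.
rules <- c("age>40", "bmi_grp>2")

# Join the rules with "&" and evaluate the expression inside the data frame.
keep <- eval(parse(text = paste(rules, collapse = " & ")), envir = subjects)
retained <- subjects[keep, ]

message(sprintf("%d of %d are retained.", nrow(retained), nrow(subjects)))
```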
The runModel() function is the main function for running a single model, and by default, a correlation analysis is performed.
The output of the correlation analysis can then be compiled and output to an Excel file with the following function:
To view the first 3 lines of the correlation analysis output, simply type:
##
## ModelSummary:
## run cohort spec model outcomespec exposurespec nobs
## 1 1 DPP Interactive _1_2_3_benzenetriol_sulfate_2 age 279
## 2 2 DPP Interactive _1_2_dipalmitoylglycerol age 279
## 3 3 DPP Interactive _1_2_propanediol age 279
## message adjvars adjvars.removed adjspec outcome_uid
## 1 CHEM100006374
## 2 HMDB07098
## 3 HMDB01881
## outcome exposure_uid adj_uid
## 1 1,2,3-benzenetriol sulfate (2) age
## 2 DG(16:0/16:0/0:0) age
## 3 Propylene glycol age
##
## Effects:
## run outcomespec exposurespec term corr p.value
## 1 1 _1_2_3_benzenetriol_sulfate_2 age age 0.164624501 0.005846722
## 2 2 _1_2_dipalmitoylglycerol age age 0.068903188 0.251337451
## 3 3 _1_2_propanediol age age 0.001667259 0.977882521
## NULL
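The Effects table above has one row per outcome and exposure pair, with a correlation and a p-value. A table of the same shape can be built in base R as sketched below. The data are made up, and the choice of `method = "spearman"` is an assumption for this example; this document does not state which correlation COMETS computes, and no covariate adjustment is performed here.

```r
# Toy data (made-up values): one exposure and two metabolite outcomes.
dat <- data.frame(
  age     = c(30, 35, 40, 45, 50, 55, 60, 65),
  lactose = c(1.0, 1.4, 1.1, 1.9, 2.2, 2.0, 2.8, 3.1),
  lactate = c(0.9, 0.2, 0.7, 0.4, 0.8, 0.3, 0.6, 0.5)
)

outcomes <- c("lactose", "lactate")

# One row per outcome, holding the correlation with age and its p-value.
# method = "spearman" is an assumption made for this sketch.
effects <- do.call(rbind, lapply(outcomes, function(y) {
  ct <- cor.test(dat[[y]], dat$age, method = "spearman")
  data.frame(outcomespec = y, exposurespec = "age",
             corr = unname(ct$estimate), p.value = ct$p.value)
}))
effects
```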
To display the heatmap of the resulting correlation matrix, use the showHeatmap() function.
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec="Interactive", exposures=c("bmi_grp","age"))
excorrdata <- COMETS::runModel(exmodeldata, exmetabdata, "DPP")
COMETS::showHClust(excorrdata, showticklabels=FALSE)
Results can be written to an output Excel file with the following command:
A stratified correlation analysis can be performed by specifying stratification variables in the call to getModelData(). If more than one stratification variable is specified, then the strata will be defined by all unique combinations of the stratification variables. The following call will define a model stratified by race_grp.
exmodeldata2 <- COMETS::getModelData(exmetabdata, modelspec="Interactive",
    outcomes=c("lactose","lactate"),
    exposures=c("age","bmi_grp"), strvars="race_grp")
The stratified correlation analysis is run by calling the runModel() function.
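To make the "all unique combinations" rule for multiple stratification variables concrete, here is a base R sketch on made-up data. The `female` column is a hypothetical second stratification variable, and the code only illustrates how strata can be formed; it is not how COMETS does it internally.

```r
# Toy subject table (made-up values).
subjects <- data.frame(
  id       = 1:6,
  race_grp = c(1, 2, 1, 2, 1, 1),
  female   = c(0, 0, 1, 0, 1, 1)   # hypothetical second stratification variable
)

# One stratification variable: one stratum per level.
by_race <- split(subjects, subjects$race_grp)

# Two variables: one stratum per combination that actually occurs in the data.
strata  <- interaction(subjects$race_grp, subjects$female, drop = TRUE)
by_both <- split(subjects, strata)
names(by_both)
```

Each element of `by_both` is a subset of subjects on which the same model could then be fitted separately.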
Call getModelData() to define a model which adjusts for age group, has lactose and lactate as outcome variables, and has age and bmi group as the exposure variables.
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec="Interactive", adjvars="age_grp",
    outcomes=c("lactose","lactate"), exposures=c("age","bmi_grp"))
To run a linear regression using the lm function, a list of options must be passed into runModel() with the model option set to “lm”.
lm_results <- COMETS::runModel(exmodeldata, exmetabdata, "DPP", op=list(model="lm"))
print(lm_results)
## $ModelSummary
## run cohort spec model outcomespec exposurespec wald.pvalue r.squared
## 1 1 DPP Interactive lactose age 0.16451267 0.01467966
## 2 2 DPP Interactive lactate age 0.60149903 0.00165591
## 3 3 DPP Interactive lactose bmi_grp 0.02387982 0.02206445
## 4 4 DPP Interactive lactate bmi_grp 0.00166282 0.01641147
## adj.r.squared sigma loglik aic bic deviance df.residual
## 1 0.01171183 1.8029707 -2006.3702 4022.7404 4047.2792 3237.70036 996
## 2 -0.00135115 0.2882131 -172.8794 355.7588 380.2975 82.73452 996
## 3 0.01714526 1.7980076 -2002.6087 4019.2173 4053.5716 3213.43441 994
## 4 0.01146384 0.2863629 -165.4342 344.8684 379.2227 81.51171 994
## nobs message adjvars adjvars.removed adjspec outcome_uid
## 1 1000 age_grp.2;age_grp.3 age_grp HMDB00186
## 2 1000 age_grp.2;age_grp.3 age_grp HMDB00190
## 3 1000 age_grp.2;age_grp.3 age_grp HMDB00186
## 4 1000 age_grp.2;age_grp.3 age_grp HMDB00190
## outcome exposure_uid adj_uid
## 1 Alpha-Lactose age age_grp.2;age_grp.3
## 2 L-Lactic acid age age_grp.2;age_grp.3
## 3 Alpha-Lactose bmi_grp age_grp.2;age_grp.3
## 4 L-Lactic acid bmi_grp age_grp.2;age_grp.3
##
## $Effects
## run outcomespec exposurespec term estimate std.error statistic
## 1 1 lactose age age 0.034697534 0.024961296 1.3900534
## 2 2 lactate age age -0.002083854 0.003990177 -0.5222460
## 3 3 lactose bmi_grp bmi_grp.2 -0.047905160 0.134163189 -0.3570663
## 4 3 lactose bmi_grp bmi_grp.3 0.360610471 0.145954081 2.4707118
## 5 3 lactose bmi_grp bmi_grp.4 0.420224133 0.577141138 0.7281133
## 6 4 lactate bmi_grp bmi_grp.2 0.034207637 0.021367743 1.6009008
## 7 4 lactate bmi_grp bmi_grp.3 0.080743228 0.023245640 3.4734783
## 8 4 lactate bmi_grp bmi_grp.4 -0.126241151 0.091919426 -1.3733892
## p.value
## 1 0.1648232433
## 2 0.6016151626
## 3 0.7211179261
## 4 0.0136512756
## 5 0.4667157218
## 6 0.1097166272
## 7 0.0005359045
## 8 0.1699410577
##
## attr(,"ptime")
## [1] "Processing time: 0.23 sec"
Run a linear regression using the glm function for the same variables as above. The default family used with glm is “gaussian”, which corresponds to a linear regression. The Effects data frame will be the same as with lm, but the ModelSummary data frame will contain some different columns.
glm_results <- COMETS::runModel(exmodeldata, exmetabdata, "DPP", op=list(model="glm"))
print(all.equal(lm_results$Effects, glm_results$Effects))
## [1] TRUE
## run cohort spec model outcomespec exposurespec converged wald.pvalue
## 1 1 DPP Interactive lactose age 1 0.16451267
## 2 2 DPP Interactive lactate age 1 0.60149903
## 3 3 DPP Interactive lactose bmi_grp 1 0.02387982
## 4 4 DPP Interactive lactate bmi_grp 1 0.00166282
## null.deviance df.null loglik aic bic deviance df.residual
## 1 3285.93681 999 -2006.3702 4022.7404 4047.2792 3237.70036 996
## 2 82.87175 999 -172.8794 355.7588 380.2975 82.73452 996
## 3 3285.93681 999 -2002.6087 4019.2173 4053.5716 3213.43441 994
## 4 82.87175 999 -165.4342 344.8684 379.2227 81.51171 994
## nobs message adjvars adjvars.removed adjspec outcome_uid
## 1 1000 age_grp.2;age_grp.3 age_grp HMDB00186
## 2 1000 age_grp.2;age_grp.3 age_grp HMDB00190
## 3 1000 age_grp.2;age_grp.3 age_grp HMDB00186
## 4 1000 age_grp.2;age_grp.3 age_grp HMDB00190
## outcome exposure_uid adj_uid
## 1 Alpha-Lactose age age_grp.2;age_grp.3
## 2 L-Lactic acid age age_grp.2;age_grp.3
## 3 Alpha-Lactose bmi_grp age_grp.2;age_grp.3
## 4 L-Lactic acid bmi_grp age_grp.2;age_grp.3
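The equivalence between lm and a gaussian glm can be checked directly in base R, independent of COMETS. The sketch below uses made-up data; it also shows how a categorical exposure is expanded into one term per non-reference level, which is why bmi_grp contributes several rows to the Effects table.

```r
# Toy data (made-up values): a numeric outcome, a numeric covariate,
# and a three-level categorical exposure.
dat <- data.frame(
  lactose = c(1.2, 0.8, 1.9, 2.4, 2.1, 3.0, 2.7, 3.5, 1.1),
  age     = c(31, 44, 52, 38, 47, 59, 41, 63, 35),
  bmi_grp = factor(c(1, 1, 2, 2, 2, 3, 3, 3, 1))
)

fit_lm  <- lm(lactose ~ bmi_grp + age, data = dat)
fit_glm <- glm(lactose ~ bmi_grp + age, data = dat, family = gaussian())

# Coefficients and standard errors agree between the two fits.
same_coef <- all.equal(coef(fit_lm), coef(fit_glm))
same_se   <- all.equal(summary(fit_lm)$coefficients[, "Std. Error"],
                       summary(fit_glm)$coefficients[, "Std. Error"])

# The factor is expanded into one term per non-reference level.
names(coef(fit_lm))
```

The fit-level summaries differ between the two: `summary(fit_lm)` reports R-squared, while the glm object carries the null deviance and a convergence flag.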
Call getModelData() to define a model which adjusts for age group, has nested_case as the outcome variable, and has lactose and lactate as the exposure variables. The variable nested_case must be a binary 0-1 variable.
exmodeldata <- COMETS::getModelData(exmetabdata, modelspec="Interactive", adjvars="age_grp",
    outcomes="nested_case", exposures=c("lactose","lactate"))
To run a logistic regression, the list of options op must also include a model.options list with family set to “binomial”.
op <- list(model="glm", model.options=list(family="binomial"))
glm_results <- COMETS::runModel(exmodeldata, exmetabdata, "DPP", op=op)
print(glm_results)
## $ModelSummary
## run cohort spec model outcomespec exposurespec converged wald.pvalue
## 1 1 DPP Interactive nested_case lactose 1 0.51856182
## 2 2 DPP Interactive nested_case lactate 1 0.01003264
## null.deviance df.null loglik aic bic deviance df.residual nobs
## 1 1386.278 999 -692.5559 1393.112 1412.743 1385.112 996 1000
## 2 1386.278 999 -689.4160 1386.832 1406.463 1378.832 996 1000
## message adjvars adjvars.removed adjspec outcome_uid outcome
## 1 age_grp.2;age_grp.3 age_grp nested_case nested_case
## 2 age_grp.2;age_grp.3 age_grp nested_case nested_case
## exposure_uid adj_uid
## 1 HMDB00186 age_grp.2;age_grp.3
## 2 HMDB00190 age_grp.2;age_grp.3
##
## $Effects
## run outcomespec exposurespec term estimate std.error statistic
## 1 1 nested_case lactose lactose 0.02268456 0.03513914 0.6455639
## 2 2 nested_case lactate lactate 0.57214513 0.22221799 2.5747022
## p.value
## 1 0.51856182
## 2 0.01003264
##
## attr(,"ptime")
## [1] "Processing time: 0.13 sec"
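For readers who want to see what a binomial glm produces outside of COMETS, the following base R sketch fits a logistic regression on made-up data and extracts the same four quantities that appear in the Effects table (estimate, standard error, test statistic, p-value). The data values are assumptions, and no adjustment variable is included.

```r
# Toy data (made-up values): binary 0-1 outcome and one numeric exposure.
dat <- data.frame(
  nested_case = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1),
  lactate     = c(0.4, 0.9, 1.1, 0.6, 1.6, 1.3, 0.8, 1.9, 0.5, 1.4)
)

# The outcome must contain only 0 and 1.
stopifnot(all(dat$nested_case %in% c(0, 1)))

fit <- glm(nested_case ~ lactate, data = dat, family = binomial())

# Estimate, standard error, z statistic, and p-value for each term.
coefs <- summary(fit)$coefficients
coefs["lactate", ]
```

The estimate is on the log-odds scale, as is standard for a binomial glm with the default logit link.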
All models designated in the input file can be run with one command, and individual output Excel files or correlation results will be written in the current directory by default. The function returns a list of objects.
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 ellipsis_0.3.1 class_7.3-17
## [4] rprojroot_1.3-2 corpcor_1.6.9 fs_1.4.2
## [7] rstudioapi_0.11 farver_2.0.3 remotes_2.2.0
## [10] subselect_0.15.2 prodlim_2019.11.13 fansi_0.4.1
## [13] lubridate_1.7.9 codetools_0.2-16 splines_4.0.2
## [16] mnormt_2.0.1 knitr_1.29 pkgload_1.1.0
## [19] jsonlite_1.7.0 pROC_1.16.2 caret_6.0-86
## [22] broom_0.7.0 cluster_2.1.0 compiler_4.0.2
## [25] httr_1.4.1 backports_1.1.7 assertthat_0.2.1
## [28] Matrix_1.2-18 lazyeval_0.2.2 cli_2.0.2
## [31] htmltools_0.5.0 prettyunits_1.1.1 tools_4.0.2
## [34] gtable_0.3.0 glue_1.4.1 reshape2_1.4.4
## [37] dplyr_1.0.0 Rcpp_1.0.5 cellranger_1.1.0
## [40] vctrs_0.3.1 gdata_2.18.0 nlme_3.1-148
## [43] iterators_1.0.12 crosstalk_1.1.0.1 psych_1.9.12.31
## [46] timeDate_3043.102 gower_0.2.2 xfun_0.19
## [49] stringr_1.4.0 ps_1.3.3 testthat_2.3.2
## [52] lifecycle_0.2.0 gtools_3.8.2 devtools_2.3.1
## [55] dendextend_1.14.0 MASS_7.3-51.6 scales_1.1.1
## [58] ipred_0.9-9 TSP_1.1-10 parallel_4.0.2
## [61] RColorBrewer_1.1-2 yaml_2.2.1 memoise_1.1.0
## [64] heatmaply_1.1.1 gridExtra_2.3 ggplot2_3.3.2
## [67] rpart_4.1-15 stringi_1.4.6 gclus_1.3.2
## [70] desc_1.2.0 foreach_1.5.0 seriation_1.2-8
## [73] caTools_1.18.0 pkgbuild_1.0.8 lava_1.6.7
## [76] rlang_0.4.6 pkgconfig_2.0.3 bitops_1.0-6
## [79] evaluate_0.14 lattice_0.20-41 purrr_0.3.4
## [82] labeling_0.3 recipes_0.1.13 htmlwidgets_1.5.1
## [85] processx_3.4.3 tidyselect_1.1.0 plyr_1.8.6
## [88] magrittr_1.5 R6_2.4.1 gplots_3.0.4
## [91] generics_0.0.2 COMETS_1.5.0.0 pillar_1.4.6
## [94] withr_2.2.0 survival_3.1-12 nnet_7.3-14
## [97] tibble_3.0.3 crayon_1.3.4 KernSmooth_2.23-17
## [100] tmvnsim_1.0-2 plotly_4.9.2.1 rmarkdown_2.5
## [103] viridis_0.5.1 usethis_1.6.1 grid_4.0.2
## [106] readxl_1.3.1 data.table_1.12.8 callr_3.4.3
## [109] ModelMetrics_1.2.2.2 digest_0.6.25 webshot_0.5.2
## [112] tidyr_1.1.1 ISwR_2.0-8 stats4_4.0.2
## [115] munsell_0.5.0 registry_0.5-1 viridisLite_0.3.0
## [118] sessioninfo_1.1.1